Predicted Residual Error Sum of Squares of Mixed Models: An Application for Genomic Prediction
Author
Abstract
Genomic prediction is a statistical method for predicting the phenotypes of polygenic traits from high-throughput genomic data. Most diseases and behaviors in humans and animals are polygenic traits, as are the majority of agronomic traits in crops. Accurate prediction of these traits can help medical professionals diagnose acute diseases and help breeders increase food production. It can therefore contribute significantly to human health and global food security. Best linear unbiased prediction (BLUP) is an important tool for analyzing high-throughput genomic data for prediction. However, judging the efficacy of a BLUP model with a particular set of predictors for a given trait requires an unbiased mechanism for evaluating predictability. Cross-validation (CV) is an essential tool for this purpose. The sample is partitioned into K parts of roughly equal size, and each part is predicted using parameters estimated from the remaining K − 1 parts, so that eventually every part has been predicted from a sample that excludes it. This procedure is called K-fold CV. Unfortunately, CV substantially increases the computational burden, because the model must be refitted K times. We developed an alternative method, the HAT method, to replace CV. It corrects the residual errors estimated from the whole-sample analysis using the leverage values of the hat matrix of the random effects, yielding the predicted residual errors. Properties of the HAT method were investigated using seven agronomic traits and 1000 metabolomic traits of an inbred rice population. The results showed that the HAT method closely approximates the CV method. The method was also applied to 10 traits in 1495 rice hybrids genotyped for 1.6 million SNPs, and to human height in 6161 subjects with roughly 0.5 million SNPs from the Framingham Heart Study. In all cases, the predictabilities of the HAT and CV methods were similar. The HAT method makes it easy to evaluate the predictability of genomic prediction for large numbers of traits in very large populations.
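The idea of correcting whole-sample residuals with hat-matrix leverages can be illustrated with a minimal sketch. It is our illustration, not the paper's exact formulation. It assumes a ridge-type (BLUP-like) model with an unpenalized intercept and a penalty lam standing in for the variance ratio sigma_e^2 / sigma_g^2, which is treated as known and held fixed. For each fold k, the predicted residuals are obtained from the whole-sample residuals e_k as (I − H_kk)^(-1) e_k, where H_kk is the block of the hat matrix for that fold. The function names, the simulated data, and the value of lam are hypothetical. Folding the intercept into the hat matrix is a simplification. For marker counts in the millions, one would work with an n × n relationship-matrix form rather than the explicit p-dimensional solve used here.

```python
import numpy as np


def design_and_penalty(Z, lam):
    """Intercept-plus-markers design matrix and its ridge penalty.

    lam stands in for the variance ratio sigma_e^2 / sigma_g^2 of a
    BLUP model; it is assumed known and is NOT re-estimated per fold.
    """
    n, p = Z.shape
    W = np.hstack([np.ones((n, 1)), Z])    # column 0 is the fixed intercept
    P = lam * np.eye(p + 1)
    P[0, 0] = 0.0                          # the intercept is not shrunk
    return W, P


def hat_matrix(W, P):
    """H such that fitted values = H @ y for the penalized fit."""
    return W @ np.linalg.solve(W.T @ W + P, W.T)


def hat_predicted_residuals(y, H, folds):
    """K-fold predicted residuals from ONE whole-sample fit (HAT-style)."""
    e = y - H @ y                          # whole-sample residuals
    e_pred = np.empty_like(e)
    for idx in folds:
        H_kk = H[np.ix_(idx, idx)]         # leverage block for fold k
        e_pred[idx] = np.linalg.solve(np.eye(len(idx)) - H_kk, e[idx])
    return e_pred


def cv_predicted_residuals(y, W, P, folds):
    """Explicit K-fold CV: refit without fold k, then predict fold k."""
    e_pred = np.empty_like(y)
    for idx in folds:
        train = np.setdiff1d(np.arange(len(y)), idx)
        Wt = W[train]
        coef = np.linalg.solve(Wt.T @ Wt + P, Wt.T @ y[train])
        e_pred[idx] = y[idx] - W[idx] @ coef
    return e_pred


def predictability(y, e_pred):
    """R^2 = 1 - PRESS / SS, with SS the total sum of squares of y."""
    press = np.sum(e_pred ** 2)
    ss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss


# Hypothetical simulated data: 200 individuals, 500 SNPs coded -1/0/1.
rng = np.random.default_rng(0)
n, p, K = 200, 500, 5
Z = rng.choice([-1.0, 0.0, 1.0], size=(n, p))
y = 10.0 + Z @ rng.normal(0.0, 0.05, p) + rng.normal(0.0, 1.0, n)
lam = 50.0                                 # hypothetical fixed variance ratio
folds = np.array_split(rng.permutation(n), K)

W, P = design_and_penalty(Z, lam)
e_hat = hat_predicted_residuals(y, hat_matrix(W, P), folds)
e_cv = cv_predicted_residuals(y, W, P, folds)

print(np.allclose(e_hat, e_cv))            # True: identity is exact for fixed lam
print(predictability(y, e_hat), predictability(y, e_cv))
```

Because lam is held fixed here, the block-deletion identity makes the two sets of predicted residuals agree exactly, while the HAT version needs only one fit instead of K. In a mixed model whose variance components are re-estimated within each CV training set, the agreement is no longer exact. This is consistent with the abstract's description of the HAT method as a close approximation to CV.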